ABSTRACT

The quadruplex forming G-rich sequences are un- 
evenly distributed throughout the human genome. 
Their enrichment in oncogenic promoters and 
telomeres has generated interest in targeting G- 
quadruplex (GQ) for an anticancer therapy. Here, we 
present a quantitative analysis on the conformations 
and dynamics of GQ forming sequences measured 
by single molecule fluorescence. Additionally, we re- 
late these properties to GQ targeting ligands and G4 
resolvase 1 (G4R1) protein binding. Our results show
that both the loop (non-G components) length and 
sequence contribute to the conformation of the GQ. 
Real time single molecule traces reveal that the fold- 
ing dynamics also depend on the loop composition. 
We demonstrate that GQ-stabilizing small molecules, 
N-methyl mesoporphyrin IX (NMM), its analog, NMP
and the G4R1 protein bind selectively to the parallel 
GQ conformation. Our findings point to the complex- 
ity of GQ folding governed by the loop length and 
sequence and how the GQ conformation determines 
the small molecule and protein binding propensity. 

INTRODUCTION 

Although the G-quadruplex (GQ) has long been thought 
to be an in vitro artifact, numerous recent studies point to 
the existence of GQ in vivo. Bioinformatics studies located 
GQ forming sequences in functional regions of the genome 
such as the transcription start site and telomeres, suggest- 
ing a potential regulatory role (1-4). GQ-binding ligands 
show that GQs located near promoter regions are directly 
involved in transcriptional regulation (5-7). Additionally, a 
recent genome-wide deep sequencing study identified that 
origins of replication are significantly associated with the 
GQ motifs (2). Moreover, the failure to resolve these structures
induces genomic instability, further supporting the formation
of GQ structures in vivo (1). GQs have been explicitly
implicated in disease onset. The stable GQ structure
that arises from the hexanucleotide repeat expansion
(HRE), (GGGGCC)n, was reported to be the most common
genetic cause of neurodegenerative diseases such
as amyotrophic lateral sclerosis (ALS) and frontotemporal
dementia (FTD) (8,9). In mRNA, GQs have been shown to
repress translation of a virus (8).

Guanine rich single stranded DNA has a strong propen- 
sity to fold into GQ in vitro. The basic formula of 
[G_3N_{1-7}G_3N_{1-7}G_3N_{1-7}G_3] allows four sets of G triplets to
form into three layers of G tetrads, mediated by the Hoog- 
steen base pairing (10). The GQ structures are stabilized 
by monovalent cations such as potassium or sodium. These 
ions occupy the central cavity created by the stacks of G 
tetrads (11-13). GQ DNA can fold into parallel, antiparal- 
lel and hybrid conformations depending on its loop length 
and sequence composition (14). Conventional techniques 
such as circular dichroism (CD) and thermal melting curves 
acquired through UV-visible spectroscopy are often used to 
distinguish GQ folding into parallel and antiparallel confor- 
mations (15). CD readings will provide either a characteris- 
tic peak at 260 nm for parallel or 295 nm for the antiparallel 
state. This allows for qualitative comparison among vari- 
ous GQ forming sequences (16). As demonstrated before, 
the single molecule FRET (smFRET) technique offers sev- 
eral advantages over ensemble methods. First, the fraction 
of molecules that fold into different conformations (paral- 
lel and antiparallel) can be quantified with accuracy. Sec- 
ond, unfolded DNA can be distinguished from folded con- 
formations. Third, the real-time imaging of single molecules 
allows for the monitoring of molecules undergoing transi- 
tions from one state to another, thus enabling kinetic analy- 
sis. This approach was applied in studies of telomeric DNA 
(17-19), modified GQ sequences in various solution condi- 
tions (18), GQ binding ligands (20) and protein interactions 
with the telomere overhang (21-23). 
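The G_3N_{1-7} motif described above lends itself to a simple sequence scan. The following is a minimal sketch, not the bioinformatic pipeline used in the cited studies; the names `GQ_MOTIF` and `find_gq_candidates` are hypothetical, and the loop definition (any one to seven bases) is taken directly from the formula in the text.

```python
import re

# Putative quadruplex motif: G3 N(1-7) G3 N(1-7) G3 N(1-7) G3, N = any base.
# The zero-width lookahead allows overlapping candidates to be reported.
GQ_MOTIF = re.compile(
    r"(?=(G{3}[ACGT]{1,7}G{3}[ACGT]{1,7}G{3}[ACGT]{1,7}G{3}))"
)


def find_gq_candidates(sequence):
    """Return (start, matched_sequence) for every putative GQ motif.

    Spaces are ignored so that sequences can be pasted as written in a table.
    """
    seq = sequence.upper().replace(" ", "")
    return [(m.start(), m.group(1)) for m in GQ_MOTIF.finditer(seq)]


if __name__ == "__main__":
    # Human telomeric repeat: four G3 tracts separated by TTA loops.
    print(find_gq_candidates("GGG TTA GGG TTA GGG TTA GGG"))
    # A poly-T strand contains no candidate.
    print(find_gq_candidates("T" * 25))
```

A pattern of this kind only flags sequences that satisfy the loop-length rule; it says nothing about which conformation, if any, a sequence adopts in solution.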

By using the smFRET assay developed previously 
(17,24), we show that the conformations of GQ are mod- 
ulated by the loop length and sequence. We confirm that 



To whom correspondence should be addressed. Tel: +1 2172446703; Fax: +1 2172650246; Email: smyong@illinois.edu 
© The Author(s) 2014. Published by Oxford University Press on behalf of Nucleic Acids Research. 

This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/3.0/), which
permits unrestricted reuse, distribution, and reproduction in any medium, provided the original work is properly cited. 



Nucleic Acids Research, 2014, Vol 42, No. 12 8107 



smaller loops promote a more parallel GQ structure, which 
agrees with previous reports (25,26). Additionally, our re- 
sults reveal new insights about the GQ folding patterns 
which are modulated by the length and composition of the
loop sequence. We present a systematic analysis of GQ con- 
formation and dynamics governed by loop length and nu- 
cleotide composition and how these properties may mod- 
ulate loading of small molecules and proteins. Our results 
can provide a useful reference for assessing the GQ folding 
potential of various important genetic elements including 
promoter sequences. 

MATERIALS AND METHODS 

DNA sample preparation 

All oligonucleotides required to make GQ DNA substrates were
purchased from IDT with either Cy3 or Cy5 dyes (Table 
1). The complementary DNA was modified with an amino
modified C6 dT, eight bases from the 5' end and reacted 
with NHS-ester conjugated Cy5 (GE Healthcare). Ten millimolar
dye was incubated with 0.1 mM DNA in 100 mM
sodium tetraborate pH 8.5 buffer over 4-5 h. The excess 
dye was removed using Micro Bio-spin 6 column (Bio-Rad) 
twice. All GQ DNA constructs were annealed by mixing the 
3'Cy3 GQ containing DNA and the complementary Cy5 
labeled-3' biotinylated DNA at a molar ratio of 1:1.5 in T50
(10 mM Tris-HCl pH 7.5, 50 mM NaCl). The annealing re- 
action was performed by incubating at 95°C for 2 min then 
slowly cooling to room temperature for 2 h. 

Single molecule imaging buffers 

For single molecule imaging, 0.8 mg/ml glucose oxidase,
0.625% glucose, ~3 mM 6-hydroxy-2,5,7,8-tetramethylchromane-2-carboxylic
acid (Trolox) and 0.03 mg/ml catalase were added to the buffer
(10 mM Tris-HCl pH 7.5 with or without 100 mM KCl). All CD
measurements were carried out in the same basic buffer (10 mM
Tris-HCl pH 7.5 and 100 mM KCl) at room temperature
(23 ± 1°C).

Single molecule fluorescence data acquisition 

Single molecule fluorescence experiments utilized quartz 
slides (Finkenbeiner) coated with polyethylene glycol 
(PEG) as described previously (27). Briefly, the slides and 
coverslips were cleaned with a combination of methanol,
acetone, potassium hydroxide and flame treatment. These 
slides were then coated with aminosilane followed by a mix- 
ture of 97.5% mPEG (m-PEG-5000, Laysan Bio, Inc.) and 
2.5% biotin PEG (biotin-PEG-5000, Laysan Bio, Inc). 

The annealed DNA molecules were immobilized on the 
PEG-passivated surface via biotin-neutravidin interaction. 
All experiments and measurements were carried out at 
room temperature (23 ± 1°C). Prism type total internal 
reflection microscopy was used to acquire single molecule 
FRET. A 532-nm Nd:YAG laser was guided through a 
prism to generate an evanescent field of illumination (27). 
Data were recorded with a time resolution of 100-200 ms
and analyzed with custom scripts written in Interactive Data
Language (IDL) to give fluorescence intensity time trajectories
of individual molecules.



smFRET data analysis 

Basic data analysis was carried out by scripts written in 
MATLAB, with FRET efficiency, E, calculated as the intensity
of the acceptor channel divided by the sum of the donor and 
acceptor intensities. FRET histograms were generated us- 
ing over 6000 individual molecules and were fitted to Gaus- 
sian distributions using Origin 8.0 (peak position left unre- 
strained). Dwell times were collected by measuring the time 
that each molecule spends in a particular FRET state. The 
means and the standard errors were plotted. Software for 
analyzing single-molecule FRET data is available for down- 
load from https://physics.illinois.edu/cplc/software/. 
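The analysis steps above (FRET efficiency per frame, then dwell time per FRET state) can be sketched as follows. This is an illustrative outline only, not the authors' MATLAB or IDL code. The function names are hypothetical, and the two thresholds separating unfolded (UF), parallel (P) and antiparallel (AP) states are assumed values placed between the low (~0.2-0.4), mid (~0.55) and high (~0.7-0.75) FRET levels reported in this paper.

```python
from itertools import groupby


def fret_efficiency(donor, acceptor):
    """E = I_A / (I_D + I_A) for each frame (assumes I_D + I_A > 0)."""
    return [a / (d + a) for d, a in zip(donor, acceptor)]


def assign_state(e, low_mid=0.45, mid_high=0.65):
    """Map a FRET value to a state label; thresholds are assumptions."""
    if e < low_mid:
        return "UF"
    if e < mid_high:
        return "P"
    return "AP"


def dwell_times(states, frame_time):
    """Collapse consecutive identical states into (state, seconds) pairs."""
    return [(s, len(list(run)) * frame_time) for s, run in groupby(states)]


if __name__ == "__main__":
    donor = [300, 300, 100, 90]      # hypothetical Cy3 intensities
    acceptor = [100, 100, 300, 110]  # hypothetical Cy5 intensities
    e = fret_efficiency(donor, acceptor)
    states = [assign_state(x) for x in e]
    print(e)
    print(dwell_times(states, frame_time=0.1))
```

Real traces are noisy, so a hard threshold would normally be replaced or preceded by smoothing or a hidden Markov model; the sketch only shows how the quantities named in the text relate to one another.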

Circular dichroism 

The CD spectra were recorded at room temperature (23 ± 
1°C) on a JASCO J-715 spectropolarimeter over the range 
of 200-320 nm using a 1-mm path length quartz cuvette 
with a reaction volume of 200 µl. The GQ oligonucleotide
concentration was 15 µM.

Steady state quenching measurements 

Bulk fluorescence quenching measurements were performed
at room temperature in a standard buffer condition
(10 mM Tris-HCl pH 7.5, 100 mM KCl) with 50 nM of the
previously mentioned GQ DNA minus the 3' biotin. Cy3
fluorescence excitation was set at 532 nm and emission was
monitored at 572 nm. Bandwidths of both the excitation and
emission filters were set at 10 nm. Fluorescence quenching was
initiated with a small amount of GQ ligand and monitored
with a fluorescence spectrophotometer (Cary Eclipse,
Varian). Fluorescence quenching curves were fitted with a
double exponential fit to establish a saturation point.
Additionally, the Hill coefficient was calculated for the quenching
curves to determine the Kd of each drug binding (shown
below). The ligands utilized (NMM, NMP and NMMDE)
were purchased from Frontier Scientific, Inc., UT, USA:



where y is the percentage of fluorescence quenching and x 
is the small molecule drug concentration. 
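As an illustration of how a Kd and Hill coefficient can be extracted from a quenching curve, the sketch below assumes a standard Hill form, y = y_max * x^n / (Kd^n + x^n), with y and x defined as above. That functional form is an assumption that matches the variable definitions but is not confirmed by the text, and the function names are hypothetical. The fit uses the linearized Hill plot, log(y / (y_max - y)) = n*log(x) - n*log(Kd), with y_max taken as the saturation level.

```python
import math


def hill(x, y_max, k_d, n):
    """Percentage quenching predicted by the assumed Hill-type curve."""
    return y_max * x**n / (k_d**n + x**n)


def fit_hill_linearized(xs, ys, y_max):
    """Estimate (k_d, n) by least squares on the Hill plot.

    Points with x <= 0 or y outside (0, y_max) cannot be log-transformed
    and are skipped.
    """
    pts = [(math.log(x), math.log(y / (y_max - y)))
           for x, y in zip(xs, ys) if x > 0 and 0 < y < y_max]
    mean_u = sum(u for u, _ in pts) / len(pts)
    mean_v = sum(v for _, v in pts) / len(pts)
    slope = (sum((u - mean_u) * (v - mean_v) for u, v in pts)
             / sum((u - mean_u) ** 2 for u, _ in pts))
    intercept = mean_v - slope * mean_u
    return math.exp(-intercept / slope), slope


if __name__ == "__main__":
    # Synthetic, noise-free data generated from hypothetical parameters.
    xs = [0.01, 0.03, 0.1, 0.3, 1.0]
    ys = [hill(x, y_max=80.0, k_d=0.1, n=1.5) for x in xs]
    print(fit_hill_linearized(xs, ys, y_max=80.0))
```

The linearized form is convenient for a quick estimate; with noisy data a direct nonlinear least-squares fit of the same equation is usually preferable because the log transform distorts the error weighting near saturation.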

G4 resolvase 1 purification 

Codon optimized cDNA of G4 resolvase 1 was purchased
from GeneScript, Inc., NJ, USA. The cDNA
was transformed into the BL21(DE3) Escherichia coli
strain. Cells were grown at 37°C until OD (optical density)
reached 0.6. Then 0.6 mM IPTG (isopropyl-beta-D-thiogalactopyranoside)
was added to the E. coli culture
for induction and it was kept at 14°C overnight to reach
an OD of 1.2. The rest of the protein purification followed a
previously published protocol with minimal changes (28).
C-MYC G4-DNA bound streptavidin paramagnetic beads
(CGSPB) were prepared by adding 3 OD of biotin
c-MYC 51mer (Supplementary Table S1) DNA to 2 ml of
MagnaBind magnetic beads from Thermo Scientific, USA.
Recombinant G4R1 protein was initially purified by means
of a His6 tag by utilizing the TALON cobalt beads and
xTractor kit according to the manufacturer's (Clontech)
instructions with 2x Sigma protease inhibitor mixture, 0.01
mM PMSF (phenylmethylsulfonyl fluoride) and 15 µg/ml
leupeptin added. BL21 cell lysates were isolated and bound
to TALON cobalt resin (0.5 ml bead volume per 500 ml of
E. coli culture) as recommended by the manufacturer.
Cobalt resin was washed three times with ice-cold
SSC (4x) with β-mercaptoethanol (0.5 µl/ml). Recombinant
protein was eluted from resin with three washes of
0.5 ml of histidine elution buffer (0.7 M histidine, pH
6.0, 8.6 mM β-mercaptoethanol, 1x Sigma protease inhibitor
mixture), followed by one 0.5-ml wash of 200 mM
EDTA (ethylenediaminetetraacetic acid) pH 6.0. For the
second phase of purification, the four eluates were combined
with 1 ml (3 ml total) of 3x Res buffer (1x: 50 mM
Tris acetate, pH 7.8, 50 mM NaCl, 70 mM glycine, 0.5
mM MgCl2, 0.012% bovine α-lactalbumin, 1x Sigma protease
inhibitor mixture, 10% glycerol) and bound to CGSPB
at 37°C for 15 min. Bound CGSPB were washed two
times in ice-cold SSC (4x) with 0.1% α-lactalbumin and 0.5
µl/ml β-mercaptoethanol. High purity recombinant His-tagged
G4 resolvase 1 was obtained by ATP-dependent elution
of CGSPB as described previously (29) except that bovine
α-lactalbumin and Sigma protease inhibitor mixture were
added to the elution buffer. Purified enzyme stock was
stored at -80°C.

Table 1. DNA oligonucleotides for GQ constructs

Name         Sequence (5' to 3')
cMyc         TGG CGA CGG CAG CGA GGC GGG T GGG GA GGG T GGG/3'Cy3/
133          TGG CGA CGG CAG CGA GGC GGG T GGG TTT GGG TTT GGG/3'Cy3/
144          TGG CGA CGG CAG CGA GGC GGG T GGG TTTT GGG TTTT GGG/3'Cy3/
155          TGG CGA CGG CAG CGA GGC GGG T GGG TTTTT GGG TTTTT GGG/3'Cy3/
177          TGG CGA CGG CAG CGA GGC GGG T GGG TTTTTTT GGG TTTTTTT GGG/3'Cy3/
199          TGG CGA CGG CAG CGA GGC GGG T GGG TTTTTTTTT GGG TTTTTTTTT GGG/3'Cy3/
233          TGG CGA CGG CAG CGA GGC GGG TT GGG TTT GGG TTT GGG/3'Cy3/
333 (TTT)    TGG CGA CGG CAG CGA GGC GGG TTT GGG TTT GGG TTT GGG/3'Cy3/
433          TGG CGA CGG CAG CGA GGC GGG TTTT GGG TTT GGG TTT GGG/3'Cy3/
533          TGG CGA CGG CAG CGA GGC GGG TTTTT GGG TTT GGG TTT GGG/3'Cy3/
TTA          TGG CGA CGG CAG CGA GGC GGG TTA GGG TTA GGG TTA GGG/3'Cy3/
TAA          TGG CGA CGG CAG CGA GGC GGG TAA GGG TAA GGG TAA GGG/3'Cy3/
AAA          TGG CGA CGG CAG CGA GGC GGG AAA GGG AAA GGG AAA GGG/3'Cy3/
T25          TGG CGA CGG CAG CGA GGC (T)25/3'Cy3/
Amino 18 nt  GCC TCG C/iamino/TG CCG TCG CCA/3'Bio/ (annealed to all the 3' Cy3 sequences listed above)
313          TGG CGA CGG CAG CGA GGC GGG TTT GGG T GGG TTT GGG
331          TGG CGA CGG CAG CGA GGC GGG TTT GGG TTT GGG T GGG
515          TGG CGA CGG CAG CGA GGC GGG TTTTT GGG T GGG TTTTT GGG
551          TGG CGA CGG CAG CGA GGC GGG TTTTT GGG TTTTT GGG T GGG
717          TGG CGA CGG CAG CGA GGC GGG TTTTTTT GGG T GGG TTTTTTT GGG
771          TGG CGA CGG CAG CGA GGC GGG TTTTTTT GGG TTTTTTT GGG T GGG
919          TGG CGA CGG CAG CGA GGC GGG TTTTTTTTT GGG T GGG TTTTTTTTT GGG
991          TGG CGA CGG CAG CGA GGC GGG TTTTTTTTT GGG TTTTTTTTT GGG T GGG



Electrophoretic mobility shift assay (EMSA)

Ten nanomolar partial duplex GQ containing the Cy5 dye
at the junction (Supplementary Table S1) was mixed with 10
nM of G4R1 and incubated for a short time (3 min)
in buffer containing 10 mM Tris-acetate pH 7.8, 50 mM
KCl, 50 mM NaCl, 0.5 mM MgCl2 and 10% glycerol. The
reaction mixture was loaded and run on a 6% acrylamide gel
at 65 V for 2 h with 0.5x TBE (Tris Borate EDTA) running
buffer. Gel images were taken with an ImageQuant LAS4010
imager from GE (General Electric). Analysis in ImageJ was
used to quantify the percentage binding by taking the area
of the shifted band corresponding to G4R1-bound DNA and
dividing it by the summed area of the G4R1-bound DNA and
free DNA bands.

Figure 1. G-quadruplex conformation is distinguished by FRET. (A) The
GQ containing overhang DNA with Cy3 (green) dye at 3' end and Cy5
(red) dye. (B) FRET histograms for c-Myc and human telomere DNA. (C)
CD spectrum of c-Myc and human telomere DNA.

RESULTS 

GQ folding conformation analyzed by FRET 

An 18-bp partially duplexed DNA with Cy3 (green) dye at
the 3' end of the ssDNA overhang and Cy5 (red) dye at
the eighth position (from the junction) of the complementary
strand was utilized to monitor GQ folding (Figure 1A).
The specific position of the two dyes was chosen to be sensitive
to the differences in GQ folding attributed to parallel
(F1), antiparallel (F2) and unfolded states (UF) (17,21,24).
When in parallel, all four G-triplets are expected to point in
the same direction (upward as drawn), resulting in a mid-FRET
value (~0.55) due to the separation between the two
dyes. Since the hybrid structure can also give rise to the
mid-FRET value, CD measurements were utilized to further
distinguish the different folding configurations. In the
antiparallel case, the G-triplets will alternate in directionality,
which will yield high FRET (~0.7) due to the resulting
proximity between the two dyes (Figure 1A).

c-Myc and the human telomeric sequence were utilized
to test whether our assay distinguishes parallel from
antiparallel GQ folding (Table 1). The c-Myc sequence is
known to fold only into a parallel configuration in 100 mM
KCl (30,31). We applied the c-Myc DNA labeled with Cy3
and Cy5 (Figure 1A) to a single molecule imaging surface for
FRET detection using total internal reflection microscopy
(19,32). One field of view yields ~300-400 single molecules
that give FRET values. The FRET histogram was built from
the FRET values collected from over 1000 DNA molecules
derived from three to four areas of the imaging surface. As
expected, due to the parallel folding structure of c-Myc, a
single peak was observed, centered at 0.55 FRET (Figure
1B, top). In addition, the CD spectrum of c-Myc shows a
clear positive peak at 260 nm and a negative peak at 240 nm,
which is a signature of parallel GQ folding (33,34) (Figure
1C, orange). The 0.55 FRET value coupled with the CD
data suggests that c-Myc is folded in a parallel configuration.
The CD spectrum of the hybrid GQ is expected to show two
peaks at 270 and 290 nm (35). This is significantly different
from the 260 nm peak and 240 nm valley that we observe
for c-Myc. This verifies that the mid-FRET peak represents
a parallel, not a hybrid, conformation. This criterion is applied
for the analysis of other GQ conformations below.

The human telomere overhang GGG(TTAGGG)3 folds
into mixed parallel-antiparallel conformations (35). The
FRET histogram shows a major peak at 0.7 with a minor
peak at 0.55, likely corresponding to the antiparallel and
parallel folding, respectively (Figure 1B). The CD data
display a peak at 295 nm with a small shoulder peak at 260
nm, thus suggesting the antiparallel fold as the major
conformation (Figure 1D, blue). The complex conformations
that can result from the human telomere GQ folding have
been reported in numerous structural studies (35-38).
Although our results do not pinpoint the exact GQ folding
motif, based on the FRET value, CD data and reports from
previous studies (17,21), our data are consistent with the
parallel (F1, mid FRET) and antiparallel (F2, high FRET)
populations. Our assignment of FRET value to the GQ
conformation is the same as Ying et al. (24) and Lee et al. (17),
but different from Ray et al. (21). This most likely arises
from the presence of a flanking DNA sequence utilized in
the latter study. The smFRET approach enables us to quantify
the individual conformations exhibited by varying GQ
forming sequences.

Loop length-dependent GQ folding 

133  GGG T GGG TTT GGG TTT GGG
144  GGG T GGG TTTT GGG TTTT GGG
155  GGG T GGG TTTTT GGG TTTTT GGG
177  GGG T GGG TTTTTTT GGG TTTTTTT GGG
199  GGG T GGG TTTTTTTTT GGG TTTTTTTTT GGG
233  GGG TT GGG TTT GGG TTT GGG
333  GGG TTT GGG TTT GGG TTT GGG
433  GGG TTTT GGG TTT GGG TTT GGG
533  GGG TTTTT GGG TTT GGG TTT GGG

Figure 2. GQ conformation modulated by loop length variation. (A)
FRET histograms of 133, 144, 155, 177 and 199 DNA. (B) CD spectrum
of 133-199 DNAs. (C) FRET histograms for 133, 233, 333, 433 and 533
DNA. (D) CD spectrum of 133-533 DNAs. (E) Fraction of parallel,
antiparallel and unfolded GQ conformations for all DNAs tested.

Loop length is one of the major determinants of the GQ
folding pattern (14). Recently, it has been shown that several
G-rich sequences in oncogenic promoters form stable GQ
structures (5), often consisting of various lengths of loops.
Previous studies have reported the effects of loop length
in GQ folding through ensemble CD, UV melting, fluorescence
measurements (12,19,39) and molecular dynamics
simulations (40). Here, we systematically varied the loop
lengths to quantitatively measure the effects on GQ conformations.
Initially, we fixed the first loop at one base and varied
the second and third loops from three to nine bases, notated as
133, 144, 155, 177 and 199, respectively (Figure 2A). The
resulting FRET histograms from these constructs show a
single peak at 0.55 FRET, indicating a parallel folding
conformation. 177 and 199 exhibit an additional lower FRET
peak, likely representing an unfolded (UF) population of
molecules, consistent with the previous findings (21). CD
measurements agree with these findings: a peak pattern
reflecting parallel folding (sharp peak at 260 nm coupled with
a negative peak at 240 nm) is seen for all the GQ sequences.
The peak is broader for 199, likely affected
by the unfolded molecules (Figure 2B). This data set
strongly suggests that a single nucleotide in one loop has
a dominant effect of inducing parallel GQ folding. It is
surprising that 199, which exceeds the requirement of
[G_3N_{1-7}G_3N_{1-7}G_3N_{1-7}G_3], still folds into a parallel
conformation, likely governed by the presence of a single base
loop (10). Further, the position of the single nucleotide and its
propensity to induce parallel folding was explored by moving the
single nucleotide to the middle and third loop. In both the middle
positions (313, 515, 717 and 919) and third positions (331,
551, 771 and 991), we observe that all DNAs exhibit a clear
peak at 260 nm in CD measurement, signifying parallel
folding in all DNAs regardless of the single nucleotide
position (Supplementary Figure S1).
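The three-digit loop notation used above maps directly onto a sequence. The following sketch builds the GQ-forming portion of a construct from its name; the function names are hypothetical, poly-T loops are assumed as in the loop-length series, and the parser assumes every loop length is a single digit (true for all constructs named here).

```python
def parse_name(name):
    """Turn a construct name such as '199' into loop lengths (1, 9, 9)."""
    return tuple(int(ch) for ch in name)


def gq_sequence(loop_lengths, loop_base="T", g_tract="GGG"):
    """Build a four-tract GQ sequence from three loop lengths."""
    if len(loop_lengths) != 3:
        raise ValueError("expected exactly three loop lengths")
    loops = [loop_base * n for n in loop_lengths]
    return g_tract + "".join(loop + g_tract for loop in loops)


if __name__ == "__main__":
    for name in ("133", "199", "313", "331"):
        seq = gq_sequence(parse_name(name))
        print(name, seq, len(seq), "nt")
```

The printed length is the size of the GQ-forming segment only (four G-triplets plus the three loops); it does not include the 18-nt duplex stem shared by all constructs.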

name  sequence
TTT   GGG TTT GGG TTT GGG TTT GGG
TTA   GGG TTA GGG TTA GGG TTA GGG
TAA   GGG TAA GGG TAA GGG TAA GGG
AAA   GGG AAA GGG AAA GGG AAA GGG

Figure 3. GQ conformation controlled by loop sequence. (A) FRET
histograms for TTT (333), TTA, TAA and AAA. (B) CD spectrum of TTT,
TTA, TAA and AAA. (C) Fraction of parallel, antiparallel and unfolded
GQ conformations for all DNAs tested in (A).

Next, the number of bases in the first loop was varied
from one to five, while keeping three bases in the other two
loops. These constructs are named 133, 233, 333, 433 and
533 (Figure 2C). As we lengthen the first loop from one
(133) to two (233) and three (333), a high FRET peak at 0.75
emerges. This indicates the transition from a completely parallel
to a mixed parallel and antiparallel conformation. To
quantify this effect, we calculated the area under the Gaussian
fitted curve corresponding to parallel (P), antiparallel
(AP) and unfolded (UF) conformations (Figure 2E). Despite
the one nucleotide difference between 233 and 333, the
parallel fraction is significantly higher for 233 (91%)
than 333 (56%), suggesting a sharp transition in folding
energetics that partitions 233 from 333. Beyond this, 433 and
533 exhibit a mixture of parallel and antiparallel combined
with an increasing fraction of unfolded conformation, likely
due to the longer loop lengths. We note that the FRET
values for the UF population differ because of the total
length of single stranded DNA in different DNA constructs.
For example, 433 and 533 have total lengths of 22
and 23 nucleotides (nt), which will yield higher FRET (~0.3
FRET) values than 177 and 199, which have 27 and 31 nt
(~0.2 FRET), respectively. CD measurements corroborate
the FRET results. From 133 to 533, the positive peak at
260 nm diminishes while the 295 nm peak increases, confirming
the decrease in parallel (mid FRET) and increase in
antiparallel (high FRET) GQ (Figure 2D). However, the unfolded
conformation cannot be deduced from the CD, since
it only reports on the existence of parallel and antiparallel
states. Taken together, this shows quantitatively how the loop
length influences the GQ folding. The longer loop lengths
promote antiparallel and unfolded states while the presence
of a one-base loop dominates the folding into parallel
conformation even when neighboring loops are as long as nine
bases.
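The population fractions quoted above come from areas under Gaussian fits to the FRET histogram. A Gaussian of amplitude A and width sigma has area A*sigma*sqrt(2*pi), so the fractions follow from the fit parameters alone. The sketch below illustrates this; the function names are hypothetical, and the fit parameters in the example are invented values chosen only so that the output reproduces a 56%/44% split, not numbers taken from the paper's fits.

```python
import math


def gaussian_area(amplitude, sigma):
    """Area under A * exp(-(x - mu)^2 / (2 * sigma^2))."""
    return amplitude * sigma * math.sqrt(2.0 * math.pi)


def population_fractions(peaks):
    """peaks maps a state label to (amplitude, sigma); returns fractions."""
    areas = {label: gaussian_area(a, s) for label, (a, s) in peaks.items()}
    total = sum(areas.values())
    return {label: area / total for label, area in areas.items()}


if __name__ == "__main__":
    # Hypothetical fit parameters (amplitude, sigma) for two FRET peaks.
    fit = {"P": (56.0, 0.05), "AP": (44.0, 0.05)}
    print(population_fractions(fit))
```

Because the area depends on both amplitude and width, comparing peak heights alone would misestimate the populations whenever the peaks have different widths.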



Loop sequence dependent GQ folding 

In order to study the role of loop composition in GQ folding,
several derivatives of the 333 construct were prepared.
Four different sequence compositions (TTT, TTA, TAA
and AAA) were tested (Figure 3A). As shown, TTT (333)
contains a mixture of parallel (56%) and antiparallel (44%)
conformations (Figures 2C and 3A). A diminishing population
of parallel folding was observed upon adenine (A)
replacing thymine (T) in the 333 length constructs. The parallel
fraction for TTT, TTA and TAA decreased from 56%
to 35% and 16%, respectively. When all thymines are replaced
with adenine as in the AAA construct, a majority of the molecules
exhibited a low FRET value corresponding to the unfolded
state (Figure 3B). The FRET value for unfolded AAA (0.4
FRET) is higher than that of 433-533 (0.3 FRET) and 177-199
(0.2 FRET). This can be attributed to the shorter ssDNA
length of 21 nucleotides as well as a possible helical
secondary structure stabilized by ApG or GpA (41). This
decreasing parallel population is supported by CD experiments,
which show a decrease in the 260 nm peak with
increasing adenine content (Figure 3C). The 255 nm peak
observed for AAA is likely due to the unfolded GGGAAA
repeats (42). This unexpected sequence-dependent effect
may be explained by the difference in steric hindrance imposed
by adenine and thymine in these sequences. Adenine, as a
purine, bears a two-ring carbon-nitrogen unit, which is
substantially larger than the single ring of thymine. When
in a loop confinement, the adenine bases may experience
a greater degree of steric hindrance than the thymines,
disfavoring the tight packing of G-triplets in parallel
conformation.

Initial folding conformation and kinetics 

Single molecule FRET traces were analyzed for all the constructs
tested above in order to decipher differences in initial
folding and subsequent kinetic behavior. To capture the moment
of GQ folding, DNA substrates were incubated in
buffer devoid of any cations (10 mM Tris-HCl pH 7.5). This
results in no GQ formation. Potassium-containing buffer
(100 mM KCl, 10 mM Tris-HCl pH 7.5) was introduced
while monitoring the smFRET signal change. Real-time
smFRET trace analysis indicated that all sequences initially
exhibit low FRET (0.25-0.35), which is expected due
to the lack of folding in cation-free buffer conditions. When
the potassium buffer is introduced (demarcated by a red arrow),
233-533 constructs initially fold into an antiparallel
(0.75 FRET) conformation, while all the sequences with a
one-base first loop (133-199) fold directly into the parallel state
(0.55 FRET) within a time resolution of 100 ms (Figure 4A and
B). Initial flow single molecule traces allowed for the calculation
of the rate of initial folding by taking the dwell time
between the moment of KCl buffer flow and the moment of
FRET increase (Figure 4A and B). The results represent a
sampling of over 200 molecules for each condition. Among the
233-533 DNAs, the shortest-looped construct, 233, displayed the
fastest folding to antiparallel, followed by 333, TTA, TAA,
433 and 533 (Figure 4C). Likewise, 133 exhibited the highest
folding rate to parallel conformation followed by 144, 155,
177 and 199. We note that the rates of the short constructs,
133 and 233, are likely underestimated due to the time delay
expected from equilibration after the manual buffer flow in
our system. This may contribute to the increased heterogeneity
seen in the folding rates of 133 and 233 (Figure 4C and D).
The folding rate between the fastest (133) and slowest (199)
differs by more than one order of magnitude according to
these measurements. This result signifies that shorter
loop length induces faster folding kinetics.
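The initial folding rate described above (dwell time between buffer flow and FRET increase, pooled over many molecules) can be sketched as follows. The function names and the 0.45 FRET threshold are assumptions for illustration. The rate estimate treats folding as a single-exponential process, for which the maximum-likelihood rate is the reciprocal of the mean dwell time; the paper does not state which estimator was used.

```python
import math


def time_to_fold(fret_trace, flow_frame, frame_time, threshold=0.45):
    """Seconds from buffer flow to the first frame at or above threshold.

    Returns None if the trace never reaches the threshold.
    """
    for i in range(flow_frame, len(fret_trace)):
        if fret_trace[i] >= threshold:
            return (i - flow_frame) * frame_time
    return None


def folding_rate(dwells):
    """Return (k, standard_error) with k = 1 / mean dwell time.

    Assumes exponentially distributed dwell times with a positive sum;
    the standard error is the large-sample value k / sqrt(n).
    """
    n = len(dwells)
    k = n / sum(dwells)
    return k, k / math.sqrt(n)


if __name__ == "__main__":
    # Hypothetical trace: low FRET, buffer flow at frame 2, folding at frame 5.
    trace = [0.30, 0.30, 0.30, 0.30, 0.30, 0.55, 0.55]
    print(time_to_fold(trace, flow_frame=2, frame_time=0.1))
    print(folding_rate([1.0, 2.0, 3.0]))
```

Any dead time between buffer injection and equilibration adds a constant offset to every dwell, which is one way the underestimate noted above for the fastest constructs would arise.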
Figure 4. Kinetic analysis of initial GQ folding. (A) Representative
smFRET traces of initial folding upon KCl addition for 233-533 DNAs. (B)
smFRET trace of initial folding for 133-199 DNAs. (C) Initial folding rate
of 233-533 DNAs. (D) Initial folding rate of 133-199 DNAs.



Figure 5. Kinetic analysis of folding-unfolding dynamics of GQ. (A)
Representative smFRET traces that display folding-unfolding kinetics. (B)
Kinetic rates of folding and unfolding for all DNAs tested. (C) Bar graph of
all kinetic rates. (D) Diagram showing three state transitions in GQ folding
pathway.

Kinetics of conformational exchange

After the initial folding, all short looped constructs (133,
233) as well as 144-199 displayed a constant 0.55 FRET,
reflecting the stable nature of the parallel folding
(Supplementary Figure S2A). Despite the minimal high FRET
peak observed for 233 (Figure 2B), the inter-conversion
from mid FRET (0.55) to high FRET (0.75) occurs very
infrequently, rendering the dwell time collection difficult
for these DNAs. In contrast, dynamic folding behavior is
observed for TTT (333), TTA, TAA, 433 and 533 constructs
(Figure 5A). There are three FRET states in dynamic
exchange: antiparallel (0.75), parallel (0.55) and unfolded
(0.2-0.3) for all five DNA constructs. The majority of the
FRET traces exhibited highly dynamic transitions between
all three states, although a small fraction of molecules
exhibits a long-lived folded state in one specific
conformation (Supplementary Figure S2B). This observation
is consistent with the previously reported findings on the
human telomere overhang (17) and Tel23 sequence (43).
Dwell times of individual FRET transitions representative
of the three states and the corresponding six sets of kinetic
rates were calculated (Figure 5B). For example, the rate at
which a parallel state interconverts to an antiparallel state
is notated as 'P_AP'. Similarly, 'P_U' refers to the rate at
which parallel (mid FRET) folding transitions to an unfolded
state (low FRET). Most rates lie within the 0.1-0.2
s^-1 range (Figure 5C), while substantially faster rates are
observed for U_AP and U_P of two constructs, TTA and
TTT (333). These rates suggest that TTA and TTT fold into
parallel conformation five to six times faster than TAA, 433
and 533. This unusually high folding rate obtained for TTA
points to an inherent property built into the human telomeric
DNA. This property may have further biological implications.
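The six interconversion rates (for example P_AP and P_U) can be estimated from a state-assigned trace. The sketch below uses a common estimator for a continuous-time Markov model, k(i->j) = N(i->j) / T(i), where N is the number of observed i-to-j transitions and T is the total time spent in state i. This is an illustrative alternative, not necessarily the dwell-time procedure used for Figure 5; the function name and the example trace are hypothetical, and state labels follow the paper's notation with U for unfolded.

```python
from collections import defaultdict
from itertools import groupby


def transition_rates(states, frame_time):
    """Estimate k(i->j) = N(i->j) / T(i) from a per-frame state sequence.

    Returns a dict keyed like 'P_AP' (from P to AP), in units of 1/s.
    """
    runs = [(s, len(list(run)) * frame_time) for s, run in groupby(states)]
    time_in = defaultdict(float)
    counts = defaultdict(int)
    for state, dwell in runs:
        time_in[state] += dwell
    for (state, _), (nxt, _) in zip(runs, runs[1:]):
        counts[(state, nxt)] += 1
    return {f"{a}_{b}": n / time_in[a] for (a, b), n in counts.items()}


if __name__ == "__main__":
    # Hypothetical trace at 0.1 s per frame.
    trace = ["U"] * 5 + ["AP"] * 10 + ["P"] * 5 + ["U"] * 5 + ["P"] * 5
    for name, k in sorted(transition_rates(trace, 0.1).items()):
        print(name, round(k, 3))
```

Transitions that are never observed in a trace simply do not appear in the result, so rates pooled over many molecules should be computed from summed counts and summed times rather than by averaging per-molecule rates.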



Conformation specific binding of NMM, NMP and G4 resolvase 1

NMM (N-methyl mesoporphyrin IX) is one of the first
small molecules reported to bind GQ DNA (44). Subsequent
work has further demonstrated its high specificity
for binding GQ DNA (45-47). Such selective binding of
NMM to GQ was employed in diverse chemical and biological
screening efforts (46,48). NMM binding was also
shown to inhibit GQ unwinding by RecQ and BLM helicase
(47,49). More recently, a study by Nicoludis et al. showed
that NMM selectively binds the parallel conformation of the
human telomere sequence (30). We applied NMM and its
homolog, NMP, to all the GQ DNAs previously investigated
to test if their binding propensity displays specificity toward
the parallel conformation.

When we applied NMM to GQ DNAs labeled with Cy3 
labeled at the 3' end, we observed immediate quenching 
of the fluorescence in ensemble measurements (Figure 6A). 
The quenching level corresponds to the previously observed 
patterns of parallel GQ conformation. Thus highly paral- 
lel sequences exhibited a large degree of quenching, while 
the highly antiparallel sequences displayed low quenching. 
We obtained the percentage quenching by taking an inverse 
of the fluorescence reading as a function of NMM con- 
centration (Figure 6B). As stated previously, most paral- 
lel DNAs including c-Myc, 133 and 233 displayed highest 
degree of quenching whereas the least parallel DNAs such 
as TTA and TAA exhibited substantially lower quenching. 
This suggests a selective binding of NMM to parallel GQ. 
As a negative control, T25 (25 nt deoxy-thymidine) was 
tested, and this construct resulted in the lowest quench- 
ing value. This minimal quenching represents a nonspe- 
cific interaction of NMM with the DNA. We performed 
the same experiment with NMP and obtained quenching 
signal similar to the NMM (Supplementary Figure S3 A). 
To quantify the parallel specificity of NMM and NMP, we
plotted the percentage parallel GQ obtained by FRET
(orange) with the maximal percentage quenching by NMM
(blue) and NMP (green) after subtracting the T25 signal as
a background (Figure 6C). The highly correlated values
between the percentage parallel conformation and percentage
quenching strongly suggest that NMM and NMP both bind
specifically to parallel GQ. From the quenching curve, we
obtained the dissociation constant, Kd, for NMM, NMP and
other GQ ligands including NMMDE and BRACO19 (Supplementary
Figure S3B). In agreement with the concentration dependent
quenching, NMM and NMP displayed a low Kd (~0.1 μM) for
highly parallel GQ and a high Kd (10-500 μM) for less
parallel GQ constructs. NMMDE and BRACO19 showed less
specificity. A macrocyclic drug that was also tested showed
substantially less specificity of binding to parallel GQs
(Supplementary Figure S3C and D).

Figure 6. GQ ligand and G4 resolvase binding. (A) Fluorescence quenching
assay. (B) The percentage quenching observed for all GQ DNAs. (C)
Bar graph plot of percentage parallel conformation (orange) and
percentage quenching obtained for NMM (blue) and NMP (green) induced
quenching. (D) Schematic of G4R1 binding assay. (E) EMSA result of
G4R1 binding to GQ DNAs. (F) Bar graph plot of percentage parallel
conformation (orange) and percentage G4R1 binding (purple).
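
The quenching analysis described above can be illustrated with a
minimal sketch. It is not the authors' analysis code. It assumes
that percentage quenching is the fractional loss of fluorescence
relative to the ligand-free reading, that the T25 control is
subtracted as a constant background, and that the quenching curve
follows a simple 1:1 binding isotherm with negligible ligand
depletion. The text does not state which binding model was used,
so the model, all function names and the titration values are
hypothetical.

```python
import math


def percent_quenching(f_free, f_ligand):
    """Percentage loss of fluorescence relative to the ligand-free reading."""
    return 100.0 * (1.0 - f_ligand / f_free)


def subtract_background(q_sample, q_control):
    """Remove the nonspecific (T25-like) quenching, clamped at zero."""
    return max(q_sample - q_control, 0.0)


def single_site_quenching(ligand_um, q_max, kd_um):
    """Assumed 1:1 isotherm: Q = Qmax * [L] / (Kd + [L]), concentrations in uM."""
    return q_max * ligand_um / (kd_um + ligand_um)


def fit_kd(ligand_um, quenching, q_max, n_grid=601):
    """Least-squares grid search for Kd between 1e-3 and 1e3 uM (log spaced)."""
    best_kd, best_err = None, math.inf
    for i in range(n_grid):
        kd = 10.0 ** (-3.0 + 6.0 * i / (n_grid - 1))
        err = sum(
            (q - single_site_quenching(c, q_max, kd)) ** 2
            for c, q in zip(ligand_um, quenching)
        )
        if err < best_err:
            best_kd, best_err = kd, err
    return best_kd


# Synthetic, noise-free titration generated from the model itself.
# These are NOT measurements from the study.
concs = [0.01, 0.03, 0.1, 0.3, 1.0, 3.0, 10.0]
synthetic = [single_site_quenching(c, 80.0, 0.1) for c in concs]
kd_estimate = fit_kd(concs, synthetic, q_max=80.0)
```

A grid search is used here only to keep the sketch dependency free.
With real titration data, a nonlinear least-squares fit that also
floats the maximal quenching would be the more usual choice.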

The GQ resolving protein, G4 resolvase 1 (G4R1), was
tested to establish whether it shows any bias toward a
particular GQ conformation. The resolvase activity of G4R1
was first identified and characterized by Harrington et al.
(50) and was later shown to require a 3' ssDNA overhang for
loading and unwinding of unimolecular GQs (28). Based on
this substrate requirement, we prepared GQ DNA with a 15 nt
ssDNA overhang to allow G4R1 binding (Figure 6D). We
applied purified G4R1 (10 nM) to the GQ DNA constructs and
performed an electrophoretic mobility shift assay (EMSA).
This allowed for the visualization and quantification of
protein binding to each GQ sequence (Figure 6E). The band
intensities of the G4R1 bound and unbound DNA were
quantified, allowing for the calculation of the bound
fraction. The percentage G4R1 binding (purple) was plotted
against the percentage parallel GQ (orange) (Figure 6F).
This result suggests that G4R1 binding is highly correlated
with the parallel percentage, indicating that G4R1
selectively binds parallel GQs.
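
As an illustration of the EMSA quantification described above, the
sketch below computes a bound fraction from two band intensities
and a Pearson correlation coefficient between two series. It
assumes that the bound fraction is the bound band intensity divided
by the sum of the bound and unbound intensities, and that "highly
correlated" can be summarized by a Pearson coefficient. The text
specifies neither, and the numbers below are arbitrary placeholders
rather than values from Figure 6.

```python
import math


def bound_fraction(bound_intensity, unbound_intensity):
    """Fraction of DNA in the shifted (protein bound) band."""
    total = bound_intensity + unbound_intensity
    if total <= 0:
        raise ValueError("band intensities must sum to a positive value")
    return bound_intensity / total


def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)


# Arbitrary placeholder inputs (hypothetical, NOT data from the study):
# percentage parallel GQ for four constructs and matching
# (bound, unbound) band intensities in arbitrary units.
percent_parallel = [90.0, 70.0, 40.0, 15.0]
band_pairs = [(850.0, 150.0), (600.0, 400.0), (350.0, 650.0), (100.0, 900.0)]
percent_bound = [100.0 * bound_fraction(b, u) for b, u in band_pairs]
r = pearson_r(percent_parallel, percent_bound)
```

The same `pearson_r` helper could be applied to the percentage
quenching values to compare ligand and protein selectivity on a
common scale.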



DISCUSSION 

In this work, we employed the smFRET method not only to
distinguish but also to quantify the distinct
conformational states that arise from varying the loop size
and sequence of GQ forming DNA. The FRET value alone is not
sufficient to report on the exact conformation of GQ
folding; thus, CD measurement and NMM ligand binding were
utilized as complementary results. The added advantage of
CD measurement is the distinct signature produced by the
parallel GQ conformation, which comprises a sharp peak at
260 nm and a valley at 240 nm. We observed these features
for all the GQs that possess single nucleotide loops,
including c-Myc, 133, 144, 155, 177 and 199. Each of these
sequences also displayed a FRET peak at 0.55. The CD
spectra of the other GQ forming sequences are less clear,
as it is difficult to distinguish population distributions
when longer loop lengths induce unfolded or less structured
GQ molecules.

Real time single molecule FRET traces enabled kinetic
analysis of the GQs as they undergo initial folding and
the subsequent dynamic behavior. The results suggest that
the initial folding rate depends heavily on the overall
loop length of the GQ DNA. 133 and 233, which possess total
loop lengths of seven and eight nucleotides, respectively,
exhibit a folding rate of ~2 s⁻¹. A total loop size of 9-10
nt shows a substantially reduced rate of 0.5-0.8 s⁻¹. When
the total loop length exceeds 11 nt, the folding rate
diminishes to below 0.1 s⁻¹. This dramatic decrease in
folding rate as a function of total loop length may have
implications for the likelihood of folding and unfolding of
potential GQ forming sequences in genomic DNA. Short looped
GQs may form more readily and persist longer than long
looped GQs.
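
To give a sense of scale for the folding rates quoted above, the
sketch below converts a rate into a mean waiting time and into the
fraction of molecules folded after a given time. It assumes that
initial folding behaves as a single first-order (single
exponential) process, which the text does not state explicitly.
The function names are hypothetical.

```python
import math


def mean_dwell_time(rate_per_s):
    """Mean waiting time (s) before folding, assuming first-order kinetics."""
    if rate_per_s <= 0:
        raise ValueError("rate must be positive")
    return 1.0 / rate_per_s


def fraction_folded(rate_per_s, t_s):
    """Fraction of molecules folded by time t_s for a single exponential."""
    return 1.0 - math.exp(-rate_per_s * t_s)


# Rates taken from the paragraph above (s^-1). The last entry is the
# upper bound quoted for total loop lengths exceeding 11 nt.
quoted_rates = [2.0, 0.8, 0.5, 0.1]
dwell_times = [mean_dwell_time(k) for k in quoted_rates]
folded_after_1s = [fraction_folded(k, 1.0) for k in quoted_rates]
```

Under this assumption, a rate of ~2 s⁻¹ corresponds to a mean wait
of about 0.5 s, rates of 0.5-0.8 s⁻¹ to about 1.25-2 s, and rates
below 0.1 s⁻¹ to more than 10 s.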

The kinetic transitions among the three conformations
(parallel, antiparallel and unfolded) showed interesting
relationships. The GQ sequences TTT and TTA displayed
faster rates of folding than the other GQ sequences tested.
For TTA, which is found in the human telomeric overhang,
this may serve a biological role. The minimization of the
time spent in the unfolded state agrees with previous
single molecule studies suggesting fast folding kinetics in
the context of long telomeric repeats (51). Such a
mechanism may help prevent end-to-end fusion, which
disrupts genomic integrity.

Here, we studied the human telomere sequence in the
context of loop sequence variance. However, human telomere
conformation is complex, as the structural dynamics are
extremely sensitive to the neighboring sequence at either
the 3' or 5' side (18,36,38). It was also shown to exhibit
diverse structures such as hybrid, parallel 1, parallel and
basket type structures in the presence of the molecular
crowding reagent, PEG (52,53). Here, we focused on the loop
size and sequence dependence rather than the complexity
arising from the variable sequence arrangement of the human
telomere.

NMM was shown to bind specifically to parallel GQ formed
in human telomeric DNA (31). Upon binding, both NMM and NMP
quenched the fluorescent dye attached to the 3' end of the
GQ DNA. This photochemical effect enabled us to quantify
the binding propensity and affinity of the ligands. The
degree of quenching exhibited by each GQ DNA showed a high
correlation with the parallel GQ formation estimated from
the FRET histogram analysis. This further supports the
interpretation that the observed mid FRET level (0.55)
represents the parallel fraction of GQ conformation. The
G4R1 loading preference toward parallel DNA points to the
possibility that the parallel GQ may be selectively
resolved by an enzyme such as G4R1. Additionally, this
suggests that parallel GQs are important structures that
require enzymatic resolving in the context of the genome.
Additional studies are needed to determine whether a
helicase is needed to resolve antiparallel GQ
conformations. Further, identifying the binding locations
of G4R1 in genomic DNA is of interest in light of the
recent finding that the human helicases XPB and XPD are
highly enriched at GQ forming regions near the
transcription start site (54).

Our results may be helpful in predicting the possible GQ 
folding landscape of the promoter. The potential GQ fold- 
ing sequences in these areas show a large variation in both 
composition and loop length. Thus, our systematic analysis 
of composition and loop length dependent properties (fold- 
ing conformation, initial folding rates and transition ki- 
netics) will prove beneficial in identifying potentially stable 
folded GQs in this diverse landscape (31,55). The smFRET 
platform utilized throughout this study can be extended to 
examine specific promoter sequences and to screen for GQ 
targeting ligands. 

SUPPLEMENTARY DATA 

Supplementary Data are available at NAR Online. 